On the Use of Holdout Samples for Model Selection
Abstract
Researchers often hold out data from the estimation of econometric models to use for external validation. In this paper we examine possible rationales for this practice. For concreteness, two examples are considered. The first example (Todd and Wolpin (2008)) is taken from the microeconometrics literature. Suppose the goal is to evaluate the impact of a monetary subsidy to low-income households based on school attendance. A social experiment is conducted in which a randomly selected treatment sample of households is offered a school attendance subsidy at some level s̄, whereas no subsidy is provided to the households in the control sample. In order to determine the optimal subsidy level, it is necessary to extrapolate the subsidy effect to other treatment levels. This requires the development and estimation of models that embed behavioral and statistical assumptions (structural models). A holdout approach to the selection among competing structural models amounts to splitting a sample Y into two subsamples, Ye and Yho. The models are then estimated based on Ye, say the data from the control group, and ranked based on their ability to predict features of the holdout sample Yho, for instance the subsidy effect on the treatment group. Examples of this kind of external model validation in the context of randomized controlled trials are Wise (1985), Todd and Wolpin (2006), and Duflo, Hanna and Ryan (2011). The second example is taken from the macroeconometrics literature. In time series analysis, competing models are often ranked in terms of their performance
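The holdout procedure described in the abstract (estimate on Ye, rank by predictive performance on Yho) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the simulated data, the two candidate models (`fit_mean`, `fit_linear`), the 150/50 split, and the mean squared prediction error criterion are all hypothetical stand-ins for the structural models and validation targets the authors discuss.

```python
import random

def fit_mean(xs, ys):
    """Hypothetical candidate A: constant prediction (sample mean of y)."""
    mu = sum(ys) / len(ys)
    return lambda x: mu

def fit_linear(xs, ys):
    """Hypothetical candidate B: OLS line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def holdout_mse(predict, xs, ys):
    """Mean squared prediction error on the holdout sample."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def rank_by_holdout(models, x_e, y_e, x_ho, y_ho):
    """Estimate each model on (x_e, y_e); rank by error on (x_ho, y_ho)."""
    scores = {
        name: holdout_mse(fit(x_e, y_e), x_ho, y_ho)
        for name, fit in models.items()
    }
    # Smallest holdout error first.
    return sorted(scores.items(), key=lambda kv: kv[1])

# Simulated sample Y (assumption: linear data-generating process with noise).
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

# Split Y into an estimation sample Ye and a holdout sample Yho.
x_e, y_e = x[:150], y[:150]
x_ho, y_ho = x[150:], y[150:]

ranking = rank_by_holdout(
    {"mean_only": fit_mean, "linear": fit_linear}, x_e, y_e, x_ho, y_ho
)
for name, mse in ranking:
    print(f"{name}: holdout MSE = {mse:.3f}")
```

In the subsidy example the split follows the experimental design (control group for estimation, treatment group as holdout) rather than an arbitrary cut of the sample, so the sketch only conveys the estimate-then-rank mechanics, not the extrapolation problem itself.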
Similar resources
Model Selection Based on Tracking Interval Under Unified Hybrid Censored Samples
The aim of statistical modeling is to identify the model that most closely approximates the underlying process. Akaike information criterion (AIC) is commonly used for model selection but the precise value of AIC has no direct interpretation. In this paper we use a normalization of a difference of Akaike criteria in comparing between the two rival models under unified hybrid cens...
A genetic programming model for bankruptcy prediction: Empirical evidence from Iran
Prediction of corporate bankruptcy is a phenomenon of increasing interest to investors/creditors, borrowing firms, and governments alike. Timely identification of firms’ impending failure is indeed desirable. By this time, several methods have been used for predicting bankruptcy but some of them suffer from underlying shortcomings. In recent years, Genetic Programming (GP) has reached great att...
Simulation of Future Land Use Map of the Catchment Area, with the Integration of Cellular Automata and Markov Chain Models Based on Selection of the Best Classification Algorithm: A Case Study of Fakhrabad Basin of Mehriz, Yazd
INTRODUCTION Since land use change affects many natural processes, including soil erosion and sediment yield, floods, soil degradation, and the chemical and physical properties of soil, different aspects of past and future land use change should be considered, particularly in planning and decision-making. One of the most important applications of remote sensing is land ...
Evaluation and selection of sustainable suppliers in supply chain using new GP-DEA model with imprecise data
Nowadays, with respect to knowledge growth about enterprise sustainability, sustainable supplier selection is considered a vital factor in sustainable supply chain management. On the other hand, usually in real problems, the data are imprecise. One method that is helpful for the evaluation and selection of the sustainable supplier and has the ability to use a variety of data types is data envel...
Model Selection for Mixture Models Using Perfect Sample
We have considered a perfect sample method for model selection of finite mixture models with either a known (fixed) or an unknown number of components, which can be applied in the most general setting with assumptions on the relation between the rival models and the true distribution. Both, one, or neither of the models may be well-specified or mis-specified, and they may be nested or non-nested. We consider mixt...